Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
This paper explores a multimodal approach for translating emotional cues present in speech, designed with Deaf and Hard-of-Hearing (dhh) individuals in mind. Prior work has focused on visual cues applied to captions, successfully conveying whether a speaker’s words have a negative or positive tone (valence), but with mixed results regarding the intensity (arousal) of these emotions. We propose a novel method using haptic feedback to communicate a speaker’s arousal levels through vibrations on a wrist-worn device. In a formative study with 16 dhh participants, we tested six haptic patterns and found that participants preferred single per-word vibrations at 75 Hz to encode arousal. In a follow-up study with 27 dhh participants, this pattern was paired with visual cues, and narrative engagement with audio-visual content was measured. Results indicate that combining haptics with visuals significantly increased engagement compared to a conventional captioning baseline and a visuals-only affective captioning style.more » « lessFree, publicly-accessible full text available April 25, 2026
-
Public sector leverages artificial intelligence (AI) to enhance the efficiency, transparency, and accountability of civic operations and public services. This includes initiatives such as predictive waste management, facial recognition for identification, and advanced tools in the criminal justice system. While public-sector AI can improve efficiency and accountability, it also has the potential to perpetuate biases, infringe on privacy, and marginalize vulnerable groups. Responsible AI (RAI) research aims to address these concerns by focusing on fairness and equity through participatory AI. We invite researchers, community members, and public sector workers to collaborate on designing, developing, and deploying RAI systems that enhance public sector accountability and transparency. Key topics include raising awareness of AI's impact on the public sector, improving access to AI auditing tools, building public engagement capacity, fostering early community involvement to align AI innovations with public needs, and promoting accessible and inclusive participation in AI development. The workshop will feature two keynotes, two short paper sessions, and three discussion-oriented activities. Our goal is to create a platform for exchanging ideas and developing strategies to design community-engaged RAI systems while mitigating the potential harms of AI and maximizing its benefits in the public sector.more » « less
-
People learning American Sign Language (ASL) and practicing their comprehension skills will often encounter complex ASL videos that may contain unfamiliar signs. Existing dictionary tools require users to isolate a single unknown sign before initiating a search by selecting linguistic properties or performing the sign in front of a webcam. This process presents challenges in extracting and reproducing unfamiliar signs, disrupting the video-watching experience, and requiring learners to rely on external dictionaries. We explore a technology that allows users to select and view dictionary results for one or more unfamiliar signs while watching a video. We interviewed 14 ASL learners to understand their challenges in understanding ASL videos, strategies for dealing with unfamiliar vocabulary, and expectations for anin situdictionary system. We then conducted an in-depth analysis with eight learners to examine their interactions with a Wizard-of-Oz prototype during a video comprehension task. Finally, we conducted a comparative study with six additional ASL learners to evaluate the speed, accuracy, and workload benefits of an embedded dictionary-search feature within a video player. Our tool outperformed a baseline in the form of an existing online dictionary across all three metrics. The integration of a search tool and span selection offered advantages for video comprehension. Our findings have implications for designers, computer vision researchers, and sign language educators.more » « less
-
Affective captions employ visual typographic modulations to convey a speaker’s emotions, improving speech accessibility for Deaf and Hard-of-Hearing (dhh) individuals. However, the most effective visual modulations for expressing emotions remain uncertain. Bridging this gap, we ran three studies with 39 dhh participants, exploring the design space of affective captions, which include parameters like text color, boldness, size, and so on. Study 1 assessed preferences for nine of these styles, each conveying either valence or arousal separately. Study 2 combined Study 1’s top-performing styles and measured preferences for captions depicting both valence and arousal simultaneously. Participants outlined readability, minimal distraction, intuitiveness, and emotional clarity as key factors behind their choices. In Study 3, these factors and an emotion-recognition task were used to compare how Study 2’s winning styles performed versus a non-styled baseline. Based on our findings, we present the two best-performing styles as design recommendations for applications employing affective captions.more » « less
-
Analyzing dance moves and routines is a foundational step in learning dance. Videos are often utilized at this step, and advancements in machine learning, particularly in human-movement recognition, could further assist dance learners. We developed and evaluated a Wizard-of-Oz prototype of a video comprehension tool that offers automatic in-situ dance move identification functionality. Our system design was informed by an interview study involving 12 dancers to understand the challenges they face when trying to comprehend complex dance videos and taking notes. Subsequently, we conducted a within-subject study with 8 Cuban salsa dancers to identify the benefits of our system compared to an existing traditional feature-based search system. We found that the quality of notes taken by participants improved when using our tool, and they reported a lower workload. Based on participants’ interactions with our system, we offer recommendations on how an AI-powered span-search feature can enhance dance video comprehension tools.more » « less
-
Searching unfamiliar American Sign Language (ASL) words in a dictionary is challenging for learners, as it involves recalling signs from memory and providing specific linguistic details. Fortunately, the emergence of sign-recognition technology will soon enable users to search by submitting a video of themselves performing the word. Although previous research has independently addressed algorithmic enhancements and design aspects of ASL dictionaries, there has been limited effort to integrate both. This paper presents the design of an end-to-end sign language dictionary system, incorporating design recommendations from recent human–computer interaction (HCI) research. Additionally, we share preliminary findings from an interview-based user study with four ASL learners.more » « less
-
Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers. However, the emotional information within the speech is not captured. We developed three emotive captioning schemas that map the output of audio-based emotion detection models to expressive caption text that can convey underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimuli videos. In an experimental evaluation with 28 DHH viewers, we compared DHH viewers’ ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference in participants’ ability to understand the emotion based on the captions or their subjective preference ratings. Open-ended feedback revealed factors contributing to individual differences in preferences among the participants and challenges with automatically generated emotive captions that motivate future work.more » « less
-
Despite some prior research and commercial systems, if someone sees an unfamiliar American Sign Language (ASL) word and wishes to look up its meaning in a dictionary, this remains a difficult task. There is no standard label a user can type to search for a sign, and formulating a query based on linguistic properties is challenging for students learning ASL. Advances in sign-language recognition technology will soon enable the design of a search system for ASL word look-up in dictionaries, by allowing users to generate a query by submitting a video of themselves performing the word they believe they encountered somewhere. Users would then view a results list of video clips or animations, to seek the desired word. In this research, we are investigating the usability of such a proposed system, a webcam-based ASL dictionary system, using a Wizard-of-Oz prototype and enhanced the design so that it can support sign language word look-up even when the performance of the underlying sign-recognition technology is low. We have also investigated the requirements of students learning ASL in regard to how results should be displayed and how a system could enable them to filter the results of the initial query, to aid in their search for a desired word. We compared users’ satisfaction when using a system with or without post-query filtering capabilities. We discuss our upcoming study to investigate users’ experience with a working prototype based on actual sign-recognition technology that is being designed. Finally, we discuss extensions of this work to the context of users searching datasets of videos of other human movements, e.g. dance moves, or when searching for words in other languages.more » « less
An official website of the United States government
